Results 1 - 9 of 9
1.
Nat Biotechnol ; 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38383603

ABSTRACT

In the era of biodiversity genomics, it is crucial to ensure that annotations of protein-coding gene repertoires are accurate. State-of-the-art tools to assess genome annotations measure the completeness of a gene repertoire but are blind to other errors, such as gene overprediction or contamination. We introduce OMArk, a software package that relies on fast, alignment-free sequence comparisons between a query proteome and precomputed gene families across the tree of life. OMArk assesses not only the completeness but also the consistency of the gene repertoire as a whole relative to closely related species and reports likely contamination events. Analysis of 1,805 UniProt Eukaryotic Reference Proteomes with OMArk demonstrated strong evidence of contamination in 73 proteomes and identified error propagation in avian gene annotation resulting from the use of a fragmented zebra finch proteome as a reference. This study illustrates the importance of comparing and prioritizing proteomes based on their quality measures.
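
The idea of alignment-free comparison mentioned in this abstract can be illustrated with a toy k-mer example. This is only a minimal sketch under stated assumptions: the function names (`kmers`, `jaccard`, `best_family`), the Jaccard score, the value `k = 6` and the one-representative-per-family layout are all hypothetical choices made for illustration, and the code is not the algorithm implemented in OMArk.

```python
from __future__ import annotations


def kmers(seq: str, k: int = 6) -> set[str]:
    """Return the set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int = 6) -> float:
    """Jaccard similarity of the k-mer sets of two sequences.

    Returns 0.0 when neither sequence is long enough to contain a k-mer.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def best_family(query: str, families: dict[str, str], k: int = 6) -> tuple[str, float]:
    """Pick the family whose representative shares the most k-mers with the query.

    `families` maps a (hypothetical) family name to one representative
    sequence; no alignment is computed at any point.
    """
    name = max(families, key=lambda fam: jaccard(query, families[fam], k))
    return name, jaccard(query, families[name], k)


if __name__ == "__main__":
    # Made-up sequences, used only to exercise the functions.
    toy_families = {
        "famA": "MKTAYIAKQRQISFVKSH",
        "famB": "GSHMLEDPVDAFKLQ",
    }
    print(best_family("MKTAYIAKQRQISFVK", toy_families))
```

Set operations on k-mers avoid the cost of pairwise alignment, which is why this style of comparison scales to whole proteomes; a production tool would use an indexed k-mer table rather than a linear scan over families.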

2.
Nucleic Acids Res ; 52(D1): D513-D521, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37962356

ABSTRACT

In this update paper, we present the latest developments in the OMA browser knowledgebase, which aims to provide high-quality orthology inferences and facilitate the study of gene families, genomes and their evolution. First, we discuss the addition of new species in the database, particularly an expanded representation of prokaryotic species. The OMA browser now offers Ancestral Genome pages and an Ancestral Gene Order viewer, allowing users to explore the evolutionary history and gene content of ancestral genomes. We also introduce a revamped Local Synteny Viewer to compare genomic neighborhoods across both extant and ancestral genomes. Hierarchical Orthologous Groups (HOGs) are now annotated with Gene Ontology annotations, and users can easily perform extant or ancestral GO enrichments. Finally, we recap new tools in the OMA Ecosystem, including OMAmer for proteome mapping, OMArk for proteome quality assessment, OMAMO for model organism selection and Read2Tree for phylogenetic species tree construction from reads. These new features provide exciting opportunities for orthology analysis and comparative genomics. OMA is accessible at https://omabrowser.org.


Subjects
Databases, Genetic; Ecosystem; Genome; Proteome; Genome/genetics; Phylogeny; Synteny; Internet; Gene Order/genetics
3.
Nucleic Acids Res ; 49(D1): D373-D379, 2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33174605

ABSTRACT

OMA is an established resource to elucidate evolutionary relationships among genes from currently 2326 genomes covering all domains of life. OMA provides pairwise and groupwise orthologs, functional annotations, local and global gene order conservation (synteny) information, among many other functions. This update paper describes the reorganisation of the database into gene-, group- and genome-centric pages. Other new and improved features are detailed, such as reporting of the evolutionarily best conserved isoforms of alternatively spliced genes, the inferred local order of ancestral genes, phylogenetic profiling, better cross-references, fast genome mapping, semantic data sharing via RDF, as well as a special coronavirus OMA with 119 viruses from the Nidovirales order, including SARS-CoV-2, the agent of the COVID-19 pandemic. We conclude with improvements to the documentation of the resource through primers, tutorials and short videos. OMA is accessible at https://omabrowser.org.


Subjects
Algorithms; Databases, Genetic; Gene Order/genetics; Genome/genetics; Animals; COVID-19/epidemiology; COVID-19/prevention & control; COVID-19/virology; Chromosome Mapping; Evolution, Molecular; Gene Ontology; Humans; Internet; Pandemics; Phylogeny; SARS-CoV-2/genetics; SARS-CoV-2/physiology; Species Specificity; Synteny
4.
Bioinformatics ; 35(14): 2504-2506, 2019 Jul 15.
Article in English | MEDLINE | ID: mdl-30508066

ABSTRACT

SUMMARY: The evolutionary history of gene families can be complex due to duplications and losses. This complexity is compounded by the large number of genomes simultaneously considered in contemporary comparative genomic analyses. As provided by several orthology databases, hierarchical orthologous groups (HOGs) are sets of genes that are inferred to have descended from a common ancestral gene within a species clade. This implies that the set of HOGs defined for a particular clade corresponds to the ancestral genes found in its last common ancestor. Furthermore, by keeping track of HOG composition along the species tree, it is possible to infer the emergence, duplications and losses of genes within a gene family of interest. However, the lack of tools to manipulate and analyse HOGs has made it difficult to extract, display and interpret this type of information. To address this, we introduce iHam (interactive HOG analysis method), an interactive JavaScript widget to visualize and explore gene family history encoded in HOGs, and pyHam (Python HOG analysis method), a Python library for programmatic processing of gene families. These complementary open source tools greatly ease adoption of HOGs as a scalable and interpretable concept to relate genes across multiple species. AVAILABILITY AND IMPLEMENTATION: iHam's code is available at https://github.com/DessimozLab/iHam or can be loaded dynamically. pyHam's code is available at https://github.com/DessimozLab/pyHam or via the pip package 'pyham'.
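
The statement that the HOGs defined at a clade correspond to the ancestral genes of its last common ancestor can be made concrete with a small nested data structure. The sketch below is a toy model only: the `Hog` class, the `hogs_at` helper and the gene and clade names are hypothetical, and none of it reflects the actual pyHam or iHam interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Hog:
    """A hierarchical orthologous group defined at one clade (toy model)."""
    level: str                                           # clade where the group is defined
    genes: list[str] = field(default_factory=list)       # extant genes attached directly
    children: list["Hog"] = field(default_factory=list)  # nested groups at sub-clades

    def members(self) -> list[str]:
        """All extant genes descended from this group's ancestral gene."""
        out = list(self.genes)
        for child in self.children:
            out.extend(child.members())
        return out


def hogs_at(root: Hog, level: str) -> list[Hog]:
    """Collect every group defined at `level`; each one stands for one ancestral gene."""
    found = [root] if root.level == level else []
    for child in root.children:
        found.extend(hogs_at(child, level))
    return found


# One vertebrate ancestral gene that duplicated before the mammalian ancestor:
# the family therefore contains two groups at the Mammalia level.
family = Hog(
    level="Vertebrata",
    genes=["ZEBRAFISH_X"],
    children=[
        Hog(level="Mammalia", genes=["HUMAN_A", "MOUSE_A"]),
        Hog(level="Mammalia", genes=["HUMAN_B", "MOUSE_B"]),
    ],
)

if __name__ == "__main__":
    print(len(hogs_at(family, "Vertebrata")), "ancestral gene(s) in the vertebrate ancestor")
    print(len(hogs_at(family, "Mammalia")), "ancestral gene(s) in the mammalian ancestor")
```

Comparing the number of groups at a parent level with the number at a child level is what lets duplications (more groups below) and losses (a sub-clade with no group) be read off the hierarchy.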


Subjects
Software; Biological Evolution; Genome
5.
Bioinformatics ; 35(7): 1159-1166, 2019 Apr 01.
Article in English | MEDLINE | ID: mdl-30184069

ABSTRACT

MOTIVATION: As the time and cost of sequencing decrease, the number of available genomes and transcriptomes rapidly increases. Yet the quality of the assemblies and the gene annotations varies considerably and often remains poor, affecting downstream analyses. This is particularly true when fragments of the same gene are annotated as distinct genes, which may cause them to be mistaken for paralogs. RESULTS: In this study, we introduce two novel phylogenetic tests to infer non-overlapping or partially overlapping genes that are in fact parts of the same gene. One approach collapses branches with low bootstrap support and the other computes a likelihood ratio test. We extensively validated these methods by (i) introducing and recovering fragmentation on the bread wheat, Triticum aestivum cv. Chinese Spring, chromosome 3B; (ii) applying the methods to the low-quality 3B assembly and validating predictions against the high-quality 3B assembly; and (iii) comparing the performance of the proposed methods to the performance of existing methods, namely Ensembl Compara and ESPRIT. Application of this combination to a draft shotgun assembly of the entire bread wheat genome revealed 1221 pairs of genes that are highly likely to be fragments of the same gene. Our approach demonstrates the power of fine-grained evolutionary inferences across multiple species to improve genome assemblies and annotations. AVAILABILITY AND IMPLEMENTATION: An open source software tool is available at https://github.com/DessimozLab/esprit2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Triticum; Genome, Plant; Molecular Sequence Annotation; Phylogeny; Software
6.
Nucleic Acids Res ; 46(D1): D477-D485, 2018 Jan 04.
Article in English | MEDLINE | ID: mdl-29106550

ABSTRACT

The Orthologous Matrix (OMA) is a leading resource to relate genes across many species from all of life. In this update paper, we review the recent algorithmic improvements in the OMA pipeline, describe increases in species coverage (particularly in plants and early-branching eukaryotes) and introduce several new features in the OMA web browser. Notable improvements include: (i) a scalable, interactive viewer for hierarchical orthologous groups; (ii) protein domain annotations and domain-based links between orthologous groups; (iii) functionality to retrieve phylogenetic marker genes for a subset of species of interest; (iv) a new synteny dot plot viewer; and (v) an overhaul of the programmatic access (REST API and semantic web), which will facilitate incorporation of OMA analyses in computational pipelines and integration with other bioinformatic resources. OMA can be freely accessed at https://omabrowser.org.


Subjects
Biological Evolution; Databases, Genetic; Genome; Molecular Sequence Annotation; Proteins/genetics; Synteny; Algorithms; Animals; Archaea/classification; Archaea/genetics; Archaea/metabolism; Bacteria/classification; Bacteria/genetics; Bacteria/metabolism; Computational Biology/methods; Fungi/classification; Fungi/genetics; Fungi/metabolism; Gene Ontology; Humans; Internet; Phylogeny; Plants/classification; Plants/genetics; Plants/metabolism; Protein Domains; Proteins/chemistry; Proteins/metabolism; Web Browser
7.
Bioinformatics ; 33(14): i75-i82, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28881964

ABSTRACT

MOTIVATION: Accurate orthology inference is a fundamental step in many phylogenetic and comparative analyses. Many methods have been proposed, including OMA (Orthologous MAtrix). Yet substantial challenges remain, in particular in coping with fragmented genes or genes evolving at different rates after duplication, and in scaling to large datasets. With more and more genomes available, it is necessary to improve the scalability and robustness of orthology inference methods. RESULTS: We present improvements in the OMA algorithm: (i) refining the pairwise orthology inference step to account for same-species paralogs evolving at different rates, and (ii) minimizing errors in the pairwise orthology verification step by testing the consistency of pairwise distance estimates, which can be problematic in the presence of fragmentary sequences. In addition, we introduce a more scalable procedure for hierarchical orthologous group (HOG) clustering, which is several orders of magnitude faster on large datasets. Using the Quest for Orthologs consortium orthology benchmark service, we show that these changes translate into substantial improvement on multiple empirical datasets. AVAILABILITY AND IMPLEMENTATION: This new OMA 2.0 algorithm is used in the OMA database ( http://omabrowser.org ) from the March 2017 release onwards, and can be run on custom genomes using OMA standalone version 2.0 and above ( http://omabrowser.org/standalone ). CONTACT: christophe.dessimoz@unil.ch or adrian.altenhoff@inf.ethz.ch.


Subjects
Evolution, Molecular; Genomics/methods; Mutation Rate; Phylogeny; Software; Algorithms; Animals; Humans; Mammals/genetics
8.
Nat Methods ; 13(5): 425-30, 2016 May.
Article in English | MEDLINE | ID: mdl-27043882

ABSTRACT

Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.
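
The precision-recall trade-off mentioned in this abstract can be illustrated by scoring a set of predicted ortholog pairs against a reference set. This is a minimal sketch under the assumption that a trusted reference set of pairs exists; the function names and gene identifiers are hypothetical, and the benchmark service described here may compute its scores quite differently.

```python
from __future__ import annotations

from typing import Iterable


def normalise(pairs: Iterable[tuple[str, str]]) -> set[frozenset[str]]:
    """Treat each ortholog pair as unordered, so (a, b) and (b, a) are the same pair."""
    return {frozenset(pair) for pair in pairs}


def precision_recall(
    predicted: Iterable[tuple[str, str]],
    reference: Iterable[tuple[str, str]],
) -> tuple[float, float]:
    """Return (precision, recall) of predicted pairs against reference pairs.

    Precision is the fraction of predictions found in the reference;
    recall is the fraction of the reference that was predicted.
    An empty denominator yields 0.0.
    """
    pred, ref = normalise(predicted), normalise(reference)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up identifiers: two of three predictions are correct,
    # and two of four reference pairs are recovered.
    predicted = [("a1", "b1"), ("a2", "b2"), ("a3", "b4")]
    reference = [("b1", "a1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]
    print(precision_recall(predicted, reference))
```

A method tuned to report only confident pairs tends to gain precision at the cost of recall, which is why different applications may prefer different methods.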


Subjects
Computational Biology/standards; Genomics/standards; Phylogeny; Proteomics/standards; Archaea/classification; Archaea/genetics; Bacteria/classification; Bacteria/genetics; Computational Biology/methods; Databases, Genetic; Eukaryota/classification; Eukaryota/genetics; Gene Ontology; Genomics/methods; Models, Genetic; Proteomics/methods; Sequence Analysis, Protein; Sequence Homology; Species Specificity
9.
Nucleic Acids Res ; 43(Database issue): D240-9, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25399418

ABSTRACT

The Orthologous Matrix (OMA) project is a method and associated database inferring evolutionary relationships amongst currently 1706 complete proteomes (i.e. the protein sequences associated with every protein-coding gene in all genomes). In this update article, we present six major new developments in OMA: (i) a new web interface; (ii) Gene Ontology function predictions as part of the OMA pipeline; (iii) better support for plant genomes and in particular homeologs in the wheat genome; (iv) a new synteny viewer providing the genomic context of orthologs; (v) statically computed hierarchical orthologous group subsets downloadable in OrthoXML format; and (vi) the possibility to export parts of the all-against-all computations and to combine them with custom data for 'client-side' orthology prediction. OMA can be accessed through the OMA Browser and various programmatic interfaces at http://omabrowser.org.


Subjects
Databases, Protein; Plant Proteins/genetics; Proteome/chemistry; Sequence Homology, Amino Acid; Algorithms; Gene Ontology; Genome, Plant; Humans; Internet; Plant Proteins/chemistry; Proteome/genetics; Synteny; Triticum/genetics